Statistical Corpus Foundations Era
W. Nelson Francis and Henry Kučera laid the statistical groundwork for corpus inference in the 1960s by compiling the Brown Corpus, a roughly one-million-word sample of American English, and analyzing it to show how large-scale word frequencies capture the distribution and variability of vocabulary. John Sinclair popularized corpus-based lexicography in the 1980s and 1990s, showing that collocations and distributional patterns in corpora reveal meaning and usage that isolated words do not. Hinrich Schütze helped move corpus linguistics toward formal statistical methods in the 1990s, developing computational models for language data, association measures, and language modeling within corpora. Tony McEnery and colleagues foregrounded methodological concerns in corpus studies, including sampling design, corpus size, representativeness, and annotation practices, framing corpus inference as a distributional problem that depends on corpus structure.